Tractable Algorithms for Proximity Search on Large Graphs
Authors
Abstract
Identifying the nearest neighbors of a node in a graph is a key ingredient in a diverse set of ranking problems, e.g. friend suggestion in social networks, keyword search in databases, and web-spam detection. To find these “near” neighbors, we need graph-theoretic measures of similarity or proximity. Most popular graph-based similarity measures, e.g. the length of the shortest path or the number of common neighbors, look at the paths between two nodes in a graph. One such class of similarity measures arises from random walks. In the context of using these measures, we identify and address two important problems. First, we note that, while random-walk-based measures are useful, they are often hard to compute. Hence we focus on designing tractable algorithms for faster and better ranking using random-walk-based proximity measures in large graphs. Second, we theoretically justify why path-based similarity measures work so well in practice. For the first problem, we focus on improving the quality and speed of nearest-neighbor search in real-world graphs. This work consists of three main components. First, we present an algorithmic framework for computing nearest neighbors in truncated hitting and commute times, which are proximity measures based on short-term random walks. Second, we improve upon this ranking by incorporating user feedback, which can counteract ambiguities in queries and data. Third, we address the problem of nearest-neighbor search when the underlying graph is too large to fit in main memory. We also prove a number of interesting theoretical properties of these measures, which have been key to designing most of the algorithms in this thesis. We address the second problem by bringing together a well-known generative model for link formation and geometric intuitions. As a measure of the quality of ranking, we examine link prediction, which has been the primary tool for evaluating the algorithms in this thesis. Link prediction has been extensively studied in prior empirical surveys. Our work helps explain some common trends in the predictive performance of different measures seen across these empirical results.
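The abstract names truncated hitting and commute times without defining them. As a minimal sketch, assuming the standard recursive definition (the T-truncated hitting time from i to j is 0 when i = j or T = 0, and otherwise 1 plus the transition-weighted average of the (T-1)-truncated hitting times of i's neighbors), the values for one target node can be computed by a short dynamic program. The function names, the dense adjacency-matrix input, and the assumption that every node has at least one edge are illustrative choices, not taken from the thesis, and this brute-force version is not the thesis's tractable algorithm.

```python
import numpy as np

def truncated_hitting_times(adj, target, horizon):
    """Expected number of steps, capped at `horizon`, for a random walk
    started at each node to first reach `target`.

    Assumes `adj` is a square (weighted) adjacency matrix in which every
    node has at least one edge; this is an illustrative simplification.
    """
    adj = np.asarray(adj, dtype=float)
    # Row-normalize to get the random-walk transition matrix P.
    P = adj / adj.sum(axis=1, keepdims=True)
    h = np.zeros(adj.shape[0])          # h^0(i, target) = 0 for all i
    for _ in range(horizon):
        h = 1.0 + P @ h                 # h^t(i) = 1 + sum_k P[i, k] * h^(t-1)(k)
        h[target] = 0.0                 # the walk stops once it hits the target
    return h

def truncated_commute_time(adj, i, j, horizon):
    """Truncated commute time: hitting time i -> j plus hitting time j -> i."""
    return (truncated_hitting_times(adj, j, horizon)[i]
            + truncated_hitting_times(adj, i, horizon)[j])

# Path graph 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = truncated_hitting_times(path, target=2, horizon=2)
# Smaller values mean "nearer" to the target within the horizon.
ranking = np.argsort(scores)
```

Ranking nodes by ascending truncated hitting time to a query node gives the kind of nearest-neighbor list the abstract describes. Each call costs one matrix-vector product per step of the horizon, which is why a naive all-pairs computation becomes expensive on large graphs.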
Similar resources
META-HEURISTIC ALGORITHMS FOR MINIMIZING THE NUMBER OF CROSSING OF COMPLETE GRAPHS AND COMPLETE BIPARTITE GRAPHS
The minimum crossing number problem is among the oldest and most fundamental problems arising in the area of automatic graph drawing. In this paper, eight population-based meta-heuristic algorithms are utilized to tackle the minimum crossing number problem for two special types of graphs, namely complete graphs and complete bipartite graphs. A 2-page book drawing representation is employed for ...
Fast Algorithms for Proximity Search on Large Graphs
The main focus of this proposal is on understanding and analyzing entity relationships in large social networks. The broad range of applications of graph based learning problems includes collaborative filtering in recommender networks, link prediction in social networks (e.g. predicting future links from the current snapshot of a graph), fraud detection and personalized graph search. In all the...
Reverse Top-k Search using Random Walk with Restart
With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that sea...
A Solution Merging Heuristic for the Steiner Problem in Graphs Using Tree Decompositions
Fixed parameter tractable algorithms for bounded treewidth are known to exist for a wide class of graph optimization problems. While most research in this area has been focused on exact algorithms, it is hard to find decompositions of treewidth sufficiently small to make these algorithms fast enough for practical use. Consequently, tree decomposition based algorithms have limited applicability ...
Keyword Proximity Search on XML Graphs
XKeyword provides efficient keyword proximity queries on large XML graph databases. A query is simply a list of keywords and does not require any schema or query language knowledge for its formulation. XKeyword is built on a relational database and, hence, can accommodate very large graphs. Query evaluation is optimized by using the graph’s schema. In particular, XKeyword consists of two stages...